1. Introduction


For certain reliability studies, the objective is to conduct trials of a pass-fail system under
identical conditions in order to show that the reliability of the system meets some minimum
specification level with a predetermined level of confidence. A natural question is “How many
trials are necessary to meet this specification?”

In mathematical language, we seek the sample size N, such that if X of the samples are
successfully tested, then a (1-α)100% lower confidence bound (LCB) for the probability of
success (p) is at least γ. Prior to the test, it must be clearly understood what constitutes a passing
trial. The values of 1-α and γ should also be agreed upon before testing commences. If the
system is one in which the probability of success is desired to be high, then the value of γ should
also be close to one. The value of 1-α, reflecting the confidence level, should also be fairly high.
In studies where conducting each trial is expensive, it may be critical to keep the
number of tests to a minimum. One way to minimize testing is to incorporate a zero-failure
policy whereby each trial must be successful in order for the reliability standard to be met.

Now consider a specific problem of this nature relating to a zero-failure study of the reliability of
armor packages in defeating a prescribed threat. For this study, both the confidence level 1-α and
the minimum reliability γ are set at 90%, meaning that based upon the results of N successful
trials, we wish to conclude with 90% confidence that the armor package is capable of defeating
the threat with a probability of at least 90%. The question of interest is “What is the value of N
that, if no failures are observed, allows us to meet this 90/90 reliability specification?”

The answer to this question is determined by examining how the true but unknown success
probability is estimated. This straightforward estimation problem has been the subject of
research for nearly two centuries. Many solutions have been proposed to this problem, which
continues to draw interest today.


2.  Interval  Estimation  of  Binomial  Proportions 


The armor reliability problem is tantamount to estimating the parameter of a binomial
distribution. The binomial distribution is used to model the number of successes (X) out of N
Bernoulli trials when the probability of success is p. If all N trials are successful, then X = N
and the maximum likelihood estimate for p is $\hat{p} = 1$. This point estimate is not very
enlightening, since it conveys no information on the variability associated with $\hat{p}$. As long as
no failures are observed, the same value for $\hat{p}$ is returned whether 2 or 2,000,000 trials are
conducted. Intuitively, as the value of N increases, we are much more likely to believe that the
true value of p is close to 1. What we prefer to report with some degree of confidence is a range
of plausible values for p, i.e., an interval estimate for the true probability of success.

When  p  is  believed  to  be  high,  there  is  usually  little  interest  in  placing  an  upper  confidence  limit 
on  its  value.  However,  a  lower  limit,  known  as  an  LCB  for p,  is  useful  since  the  LCB  represents 
a  conservative  estimate  on  the  probability  of  success.  Because  the  LCB  is  based  on  the  random 
sample  of  Bernoulli  trials,  it  too  is  a  random  variable.  The  LCB  is  a  function  of  X  and  N  which 
satisfies  the  probability  statement 

\[
1 - \alpha = P\left[\mathrm{LCB}(X, N) < p\right]. \tag{1}
\]


Many  authors  have  proposed  methods  for  calculating  LCBs  (and  confidence  intervals)  for  the 
binomial  parameter  p.  In  the  ensuing  subsections,  we  introduce  three  of  them,  and  calculate  the 
required  sample  size  for  a  zero-failure  test  that  satisfies  the  90/90  reliability  specification. 


2.1 The Clopper-Pearson Method


The Clopper-Pearson (1934) method for binomial confidence intervals is popular for its relative
ease of calculation. In general, the confidence interval limits, $(p_L, p_U)$, are solutions to the
equations

\[
\sum_{k=X}^{N} \binom{N}{k} p_L^{k} (1-p_L)^{N-k} = \frac{\alpha}{2}
\quad \text{and} \quad
\sum_{k=0}^{X} \binom{N}{k} p_U^{k} (1-p_U)^{N-k} = \frac{\alpha}{2}.
\]

If at least one success and one failure are observed among the samples, both endpoints of the
interval can be expressed as functions of percentiles from F distributions. For an LCB, with
X = N, the calculation simply reduces to $\mathrm{LCB}_{CP} = \alpha^{1/N}$. Clopper-Pearson intervals
are often referred to as “exact” intervals since they are derived from exact probability statements
and not any distributional approximations. As such, the Clopper-Pearson method is often touted
in introductory statistics textbooks.


To satisfy the 90/90 reliability specification using the Clopper-Pearson method, we seek the
minimum value of N which satisfies $.90 \le .10^{1/N}$. Taking the logarithm of both sides of this
inequality, we have $\ln(.90) \le \ln(.10)/N$, which leads to the solution
$N \ge \ln(.1)/\ln(.9) = 21.85$ (the inequality reverses because $\ln(.9)$ is negative).
Since N must be an integer, the number of zero-failure trials required to meet the 90/90
specification under the Clopper-Pearson method is rounded up to 22.
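
The arithmetic above is easy to check numerically. The following is a minimal Python sketch of the closed-form calculation; the function name `clopper_pearson_zero_failure_n` is an illustrative choice and does not come from the original study.

```python
import math


def clopper_pearson_zero_failure_n(alpha: float, gamma: float) -> int:
    """Smallest integer N with alpha**(1/N) >= gamma, i.e. N >= ln(alpha)/ln(gamma)."""
    return math.ceil(math.log(alpha) / math.log(gamma))


# 90/90 specification: alpha = 0.10, gamma = 0.90
print(clopper_pearson_zero_failure_n(0.10, 0.90))  # 22
```

The same function can be called with other confidence and reliability levels, e.g. `clopper_pearson_zero_failure_n(0.05, 0.95)` for a 95/95 specification.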

2.2  Wilson  Score  Method 

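The method developed by Wilson (1927) is based on an inversion of the score test for p, and
results in a more complex formula for the limits of the confidence interval:

\[
\frac{\hat{p} + \dfrac{z_{\alpha/2}^{2}}{2N} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{N} + \dfrac{z_{\alpha/2}^{2}}{4N^{2}}}}{1 + \dfrac{z_{\alpha/2}^{2}}{N}}, \tag{2}
\]

where $\hat{p} = X/N$ and $z_\delta$ is the value having an area of $\delta$ to its right under the
standard normal curve. However, with no failures, the formula for a one-sided LCB (which uses
$z_\alpha$ in place of $z_{\alpha/2}$) reduces to $\mathrm{LCB} = N/(N + z_\alpha^{2})$.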

To satisfy the 90/90 reliability specification using the Wilson score method, we seek the
minimum value of N which satisfies $.90 \le N/(N + z_{.10}^{2})$. The value of
$z_{.10} \approx 1.2816$ can be found in most statistics textbooks. After a few algebraic
manipulations, we get a solution of $N \ge 9z_{.10}^{2}$, or $N \ge 14.78$. Since N must be an
integer, the number of zero-failure trials required to meet the 90/90 specification under the
Wilson score method is rounded up to 15.
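
As a check on this result, the bound $N \ge \gamma z_\alpha^{2}/(1-\gamma)$ can be evaluated with the Python standard library. This is a minimal sketch; the function name is hypothetical, and `statistics.NormalDist` is used only to obtain the normal quantile instead of a textbook table.

```python
import math
from statistics import NormalDist


def wilson_zero_failure_n(alpha: float, gamma: float) -> int:
    """Smallest integer N with N / (N + z_alpha**2) >= gamma."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # about 1.2816 when alpha = 0.10
    return math.ceil(gamma / (1.0 - gamma) * z * z)


print(wilson_zero_failure_n(0.10, 0.90))  # 15
```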

2.3 Jeffreys Method

The final interval construction method that we examine is Jeffreys method, first proposed by
Rubin and Schenker (1987). This method is based on a Bayesian estimate for the success
probability, whereby p is not considered a fixed unknown parameter but rather a random
variable. The prior distribution proposed by Jeffreys method is a beta distribution with both
parameters set to 0.5, and the posterior distribution of p is given by a beta distribution with
parameters X + 0.5 and N - X + 0.5. Note that these two parameters are the number of
successes and the number of failures, both of which are then increased by 1/2. The confidence
interval endpoints are the values within the support of the posterior distribution that define the
lower and upper α/2 percentiles. For the construction of an LCB, we have

\[
\mathrm{LCB} = \mathrm{BetaCDF}^{-1}\left(\alpha,\; X + \tfrac{1}{2},\; N - X + \tfrac{1}{2}\right). \tag{3}
\]

In our specific case of X = N, the LCB is the value within the support of a beta distribution with
parameters N + 0.5 and 0.5 at which the cumulative distribution function equals α.

To satisfy the 90/90 reliability specification using Jeffreys method, we seek the minimum value
of N which satisfies the inequality

\[
.90 \le \mathrm{BetaCDF}^{-1}\left(.10,\; N + \tfrac{1}{2},\; \tfrac{1}{2}\right). \tag{4}
\]

Although a closed-form solution is not available, analytic software capable of calculating the
inverse of a beta cumulative distribution function (CDF) (e.g., MATLAB) can be used to
obtain $N \ge 12.58$. Since N must be an integer, the number of zero-failure trials required to meet
the 90/90 specification under Jeffreys method is 13.
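
For readers without access to an inverse beta CDF routine, the same search can be sketched in plain Python. The sketch below is an illustration rather than the procedure used in the study: it relies on the equivalence between "the α-quantile of the posterior is at least γ" and "the posterior probability that p exceeds γ is at least 1-α", and it evaluates that tail probability with Simpson's rule after the substitution u = sqrt(1 - p). The function names and the `steps` and `n_max` parameters are assumptions made for this example.

```python
import math


def beta_upper_tail(x, a, b, steps=2000):
    """P(p > x) for p ~ Beta(a, b) with b >= 0.5, by Simpson's rule.

    The substitution u = sqrt(1 - p) removes the endpoint singularity at p = 1.
    """
    upper = math.sqrt(1.0 - x)
    h = upper / steps  # steps must be even for Simpson's rule

    def g(u):
        return (1.0 - u * u) ** (a - 1.0) * u ** (2.0 * b - 1.0)

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return 2.0 * (total * h / 3.0) / math.exp(log_beta)


def jeffreys_min_n(failures=0, alpha=0.10, gamma=0.90, n_max=1000):
    """Smallest N whose Jeffreys LCB (X = N - failures successes) is at least gamma."""
    for n in range(failures + 1, n_max + 1):
        a = n - failures + 0.5  # successes + 1/2
        b = failures + 0.5      # failures + 1/2
        if beta_upper_tail(gamma, a, b) >= 1.0 - alpha:
            return n
    raise ValueError("no N <= n_max meets the specification")


print(jeffreys_min_n(0))  # 13
```

The `failures` argument is included so that the same search can be reused for tests that tolerate a small number of failures.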
3. Comparison of Different LCB Methods


A (1-α)100% confidence interval is usually interpreted as a range of values which contains the
true but unknown parameter value with probability 1-α. So, for example, if 1000 independent
data sets are used to generate 95% confidence intervals for some parameter, we can expect about
950 of them to contain that parameter value. However, when a confidence interval is based on
approximate distributional theory and/or discrete distributions, the actual “coverage probability,”
or proportion of time that the interval contains p, might not equal 1-α. Furthermore, the coverage
probability may depend upon the value of the parameter p. Coverage probabilities exceed the
nominal coverage probability 1-α when the confidence intervals are unnecessarily wide; we say
that such intervals are conservative. On the other hand, if the confidence intervals tend to be too
narrow, the coverage probabilities will be less than the nominal value (1-α).

To compare various confidence intervals, we need to examine their coverage probabilities.
Several authors (Ghosh, 1979; Blyth and Still, 1983; Agresti and Coull, 1998; Brown et al.,
2001) have done just this in studying the coverage probabilities of two-sided confidence intervals
for the binomial parameter p. Agresti and Coull, in particular, found that the exact method is
highly conservative and noted that the Wilson score method results in actual coverage
probabilities near the nominal level; however, they did not include Jeffreys method in their
paper. Cai (2005) points out that good performance in terms of two-sided interval coverage
probabilities does not necessarily guarantee that a method will perform similarly for one-sided
intervals. Cai compared the Jeffreys and Wilson score methods, along with other candidate
methods, for the coverage probabilities of 99% upper confidence bounds.

In this section, we focus on the coverage probabilities of 90% lower confidence bounds for the
binomial parameter using the three methods outlined in section 2. Monte-Carlo simulation was
utilized to estimate the coverage probabilities. For a given N and p, a large number of binomial
observations are randomly generated. Then, using each of the three methods, the LCBs are
calculated. The estimated coverage probabilities are equal to the frequency with which the LCBs
are less than p.
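
The simulation procedure just described can be sketched in a few lines of Python. The example below is illustrative only: it covers the Wilson score LCB alone, uses the standard Wilson score formula with a one-sided normal quantile, and the function names, the number of draws, and the seed are assumptions rather than details of the original study.

```python
import math
import random
from statistics import NormalDist


def wilson_lcb(x: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Wilson score lower confidence bound for x successes in n trials."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    p_hat = x / n
    center = p_hat + z * z / (2 * n)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - half) / (1 + z * z / n)


def simulated_coverage(n, p, alpha=0.10, draws=20_000, seed=1):
    """Monte-Carlo estimate of P[LCB(X, n) < p] when X ~ Binomial(n, p)."""
    rng = random.Random(seed)
    # The LCB depends only on x, so compute it once for every possible outcome.
    lcb = [wilson_lcb(x, n, alpha) for x in range(n + 1)]
    hits = 0
    for _ in range(draws):
        x = sum(rng.random() < p for _ in range(n))  # one binomial(n, p) observation
        hits += lcb[x] < p
    return hits / draws


for p in (0.80, 0.85, 0.90):
    print(p, simulated_coverage(10, p))
```

With N = 10 the largest possible Wilson LCB is 10/(10 + 1.2816²), roughly 0.859, so any p above that value is covered in every simulated draw; this is one source of the jagged coverage curves typical of discrete data.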

Figure 1 shows the actual coverage probability (based on 100,000 simulated binomial draws) as
a function of the probability of success, p, for the Clopper-Pearson, Wilson score, and Jeffreys
intervals when the sample size is 10, 25, and 50. The probability of success is limited to values
above 0.7 since the application of this study is to a system whose p is relatively high.

The most obvious feature of each of these plots is the saw-toothed relationship between p and
coverage probability, an artifact of the discreteness of the binomial distribution. Also worth
noting in figure 1 is that for any choice of α, N, and LCB construction method, there is an entire
interval of values, (p*, 1), for which the coverage probability is 100%. It so happens that p* is
the LCB for a zero-failure experiment; e.g., for the Clopper-Pearson method,
$p^*_{CP} = \alpha^{1/N}$; for Wilson score, $p^*_{W} = N/(N + z_\alpha^{2})$; and for Jeffreys,
$p^*_{J} = \mathrm{BetaCDF}^{-1}(\alpha, N + .5, .5)$. For most practical values of α and N,
including those in figure 1, $p^*_{CP}$ is smaller than either $p^*_{W}$ or $p^*_{J}$.

Figure 1. A comparison of coverage probabilities vs. true binomial parameter for the nominal 90%
LCB.

With  coverage  probabilities  always  exceeding  (or  at)  the  nominal  value,  the  Clopper-Pearson 
method  is  clearly  the  most  conservative  of  the  three  methods.  Both  the  Wilson  score  and 
Jeffreys  LCBs  have  coverage  probabilities  which  can  be  greater  than  or  less  than  the  nominal 
coverage  probability  depending  upon  the  true  value  of p.  The  oscillation  about  90%  nominal 
coverage  is  slightly  less  under  the  Wilson  score  method. 

Since the selection of a “best” method for constructing LCBs is dependent on p, the concept of
mean coverage probability over the range of possible values of p allows us to judge which
method on average is preferred. Sampling values of p from a uniform distribution over the range
of interest (i.e., [0.7, 1.0]) and then averaging the sample of associated coverage probabilities is
the Monte-Carlo equivalent of integrating the p-versus-coverage-probability curve. This is
exactly what is done to produce figure 2a for N = 5:5:100 (i.e., N = 5, 10, ..., 100), generating
250,000 values from each distribution. We see that on average all three methods are conservative
to some degree. The most conservative method is Clopper-Pearson, followed by Wilson score and
then Jeffreys.

Instead of assuming that the values of p are equally likely to be between 0.7 and 1.0, one can
assume that p follows some other distribution. A natural choice is the beta distribution since its
support is on the interval [0, 1]. In figure 2b, we assume that the success probability is a beta
random variable with parameters 13.6 and 2.4. These parameters were chosen to match the first
two moments of a uniform (0.7, 1.0) distribution. This results in slightly more conservative
coverage of Wilson score and Jeffreys LCBs for very small sample sizes (N = 5); however, as N
increases, the bias in coverage probability for these two LCBs is more quickly reduced.
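
The choice of 13.6 and 2.4 can be verified with the usual method-of-moments formulas for the beta distribution. The short Python sketch below is offered as a check; the helper name `beta_from_moments` is hypothetical.

```python
def beta_from_moments(mean: float, var: float) -> tuple[float, float]:
    """Beta(a, b) parameters whose mean and variance equal the given values."""
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


lo, hi = 0.7, 1.0
mean = (lo + hi) / 2          # mean of a uniform(lo, hi) distribution
var = (hi - lo) ** 2 / 12     # variance of a uniform(lo, hi) distribution
a, b = beta_from_moments(mean, var)
print(round(a, 1), round(b, 1))  # 13.6 2.4
```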


Figure 2. Mean coverage probability as a function of N for the nominal 90% Clopper-Pearson (E), Wilson score
(W), and Jeffreys (J) LCBs, when p has (a) a uniform distribution on (.7, 1) and (b) a beta distribution with
parameters 13.6 and 2.4.

These  two  figures  corroborate  the  ultra-conservative  nature  of  Clopper-Pearson  LCBs  and  lead 
us  to  conclude  that  a  22-trial  study  is  unnecessary.  As  long  as  no  failures  are  observed,  both  a 
15-trial  study  using  Wilson  score  LCBs  and  a  13-trial  study  using  Jeffreys  LCBs  will  allow  us  to 
conclude  that  the  90/90  reliability  specification  has  been  met.  Because  the  Wilson  score  method 
requires  two  additional  successful  trials  to  reach  this  same  conclusion,  it  is  slightly  more 
conservative. 
4. Sample Size Requirements for Nonzero Failure Testing

If  the  cost  of  testing  is  not  too  excessive,  a  small  number  of  failures  may  be  permissible  among 
the  test  trials.  In  this  section,  we  determine  the  sample  size  requirements  for  tests  that  permit 
either  1  or  2  failures,  and  still  allow  the  90/90  reliability  criteria  to  be  met.  We  do  so  for  both  the 
Wilson  score  and  Jeffreys  methods. 

4.1 Wilson Score Method With One Failure

To calculate the necessary sample size, we consider the lower limit of equation 2 (with
$z_\alpha$ in place of $z_{\alpha/2}$ for a one-sided bound), setting X = N-1 and α = 0.10 to get
the inequality

\[
.90 \le \frac{(N-1) + \dfrac{z_{.10}^{2}}{2} - z_{.10}\sqrt{\dfrac{N-1}{N} + \dfrac{z_{.10}^{2}}{4}}}{N + z_{.10}^{2}}. \tag{5}
\]

A closed-form solution for N is not tractable; however, using trial and error, one can show that
N = 32 is the minimum sample size which meets the 90/90 reliability specification in a
one-failure test using the Wilson score method.

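4.2 Wilson Score Method With Two Failures

Similarly, setting X = N-2 in the lower limit of equation 2, we get

\[
.90 \le \frac{(N-2) + \dfrac{z_{.10}^{2}}{2} - z_{.10}\sqrt{\dfrac{2(N-2)}{N} + \dfrac{z_{.10}^{2}}{4}}}{N + z_{.10}^{2}}. \tag{6}
\]

To meet a 90/90 reliability specification, the minimum sample size necessary for a two-failure
test using the Wilson score method is N = 47.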
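
The trial-and-error search used for the Wilson score method is simple to automate. The following Python sketch assumes the standard one-sided Wilson score lower bound and steps N upward until the bound first reaches 0.90; the function names and the `n_max` cap are illustrative assumptions.

```python
import math
from statistics import NormalDist


def wilson_lcb(x: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Wilson score lower confidence bound for x successes in n trials."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    p_hat = x / n
    center = p_hat + z * z / (2 * n)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center - half) / (1 + z * z / n)


def wilson_min_n(failures: int, alpha: float = 0.10, gamma: float = 0.90, n_max: int = 1000) -> int:
    """Smallest N for which N - failures successes give a Wilson LCB of at least gamma."""
    for n in range(failures + 1, n_max + 1):
        if wilson_lcb(n - failures, n, alpha) >= gamma:
            return n
    raise ValueError("no N <= n_max meets the specification")


for f in (0, 1, 2):
    print(f, wilson_min_n(f))
```

For zero, one, and two failures the search should return 15, 32, and 47 trials respectively.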

4.3 Jeffreys Method With One Failure

To calculate the necessary sample size, we consider the lower limit of equation 3, setting
X = N-1 and α = 0.10 to get the inequality

\[
.90 \le \mathrm{BetaCDF}^{-1}\left(.10,\; N - \tfrac{1}{2},\; \tfrac{3}{2}\right). \tag{7}
\]

Again, a closed-form expression for the minimum value of N does not exist. However, using
statistical software capable of calculating the inverse CDF for a beta distribution, we determine
that the minimum sample size necessary to meet a 90/90 reliability specification in a one-failure
test using the Jeffreys method is N = 30.

4.4 Jeffreys Method With Two Failures

Similarly, setting X = N-2 in the lower limit of equation 3, we get

\[
.90 \le \mathrm{BetaCDF}^{-1}\left(.10,\; N - \tfrac{3}{2},\; \tfrac{5}{2}\right). \tag{8}
\]

To meet a 90/90 reliability specification, the minimum sample size necessary for a two-failure
test using the Jeffreys method is N = 45.

To summarize this section, the required sample sizes for 90/90 tests using various failure
allowances and LCB construction methods are shown in table 1.


Table 1. Minimum number of trials required to test for a 90/90 reliability standard.

Number of Failures                    LCB Method
Allowed                Clopper-Pearson    Wilson Score    Jeffreys
0                      22                 15              13
1                      38                 32              30
2                      52                 47              45
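
The Clopper-Pearson entries for one and two failures are not derived in the text. As a cross-check, they can be reproduced with the sketch below, which assumes the usual binomial-tail characterization of the exact bound: the LCB is at least γ exactly when the probability of observing no more than the allowed number of failures, computed at p = γ, is at most α. The function name and `n_max` are assumptions made for this example.

```python
from math import comb


def clopper_pearson_min_n(failures: int, alpha: float = 0.10, gamma: float = 0.90,
                          n_max: int = 1000) -> int:
    """Smallest N for which N - failures successes give an exact LCB of at least gamma."""
    for n in range(failures + 1, n_max + 1):
        # P(at most `failures` failures in n trials) when the success probability is gamma
        tail = sum(comb(n, k) * (1 - gamma) ** k * gamma ** (n - k)
                   for k in range(failures + 1))
        if tail <= alpha:
            return n
    raise ValueError("no N <= n_max meets the specification")


print([clopper_pearson_min_n(f) for f in (0, 1, 2)])  # [22, 38, 52]
```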

5.  Multistage  Sampling  Plans 


For any of the aforementioned tests, it should be obvious that a 90/90 test may be terminated
once the number of allowable failures is exceeded. This may occur if the quality of the product
is poor, or if the product is of satisfactory quality but suffers from poor luck during the test.

Now consider the stopping of tests prematurely for exceptional quality by utilizing a
“multistage” sampling strategy. For example, suppose that one is using the Wilson method with
one allowable failure. The test would normally call for a sample size of 32. However, if we
observe successes in each of the first 15 trials, then we can stop the test prematurely since at this
point the 90/90 reliability specification is met. We refer to such a testing strategy as a two-stage
test. In a three-stage test using the Wilson score method, we stop if

1. Zero failures are observed in the first stage (the first 15 trials);

2. Only one failure is observed among the first and second stages (the first 32 trials);

3. Three failures are observed at any point in the test; or

4. All three stages are completed, i.e., all 47 trials have been run.
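
The stopping rules above can be written as a small decision function. This is a minimal Python sketch under the assumption that a test which completes all stages without exceeding the failure allowance is counted as meeting the specification; the function name, the `boundaries` argument, and the verdict strings are illustrative choices.

```python
def multistage_decision(outcomes, boundaries=(15, 32, 47)):
    """Apply the multistage stopping rules to a sequence of trial results.

    outcomes   -- iterable of booleans, True for a successful trial
    boundaries -- cumulative trial counts at the end of each stage
    Returns (trials_run, verdict) with verdict "pass", "fail", or "incomplete".
    """
    max_failures = len(boundaries) - 1  # e.g. 2 allowable failures in a three-stage test
    failures = 0
    trial = 0
    for trial, ok in enumerate(outcomes, start=1):
        if not ok:
            failures += 1
        if failures > max_failures:
            return trial, "fail"        # allowance exceeded: stop immediately
        if trial in boundaries and failures <= boundaries.index(trial):
            return trial, "pass"        # end of a stage with few enough failures
    return trial, "incomplete"          # ran out of data before a stopping rule applied


print(multistage_decision([True] * 15))
print(multistage_decision([False] + [True] * 31))
print(multistage_decision([False, False, False]))
```

The three calls illustrate stopping after the first stage with no failures, stopping after the second stage with a single failure, and stopping early once a third failure occurs.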

Graphically, we can display the sampling protocol for a multistage test using the technique of
figure 3, which is drawn for our 90/90 test using the Wilson score method. (Note that a
one-stage test is synonymous with a zero-failure test.) In this type of chart, the progression of the
test is mapped as a function of the number of trials and the number of observed failures. Landing
on either a square or circle terminates the test: a square indicates that the number of allowable
failures has been exceeded, whereas a circle indicates that the 90/90 reliability specification is
met. Landing on a dot means that testing should continue. Sampling protocol charts under the
Jeffreys method are not included in this document.


Figure 3. Sampling protocol charts for a 90/90 (a) one-stage test, (b) two-stage test, and (c) three-stage test,
each using the Wilson score method for estimating the LCB. Travel starts at (0,0) and moves one
unit to the right with each trial, and an additional unit upward if that trial is a failure. If the current
test “position” is on a dot, then testing continues. If the current test position is on a square, then
testing stops without meeting the 90/90 reliability specification. If the current test position is on a
circle, then testing stops with the 90/90 reliability specification satisfied.

5.1 Operating Characteristic Curves

Assuming that the probability of success is known, the probability of meeting the 90/90
reliability specification can be calculated. For example, in a one-stage test consisting of up to
$N_1$ trials, this is simply the probability that all trials are successes, $p^{N_1}$. In a two-stage
test of up to $N_2$ trials, the reliability specification is satisfied if either (1) all of the $N_1$
first-stage trials are successes, or (2) exactly one of the first-stage trials is a failure AND all of
the $N_2 - N_1$ second-stage trials are successes. The probability of this event is

\[
\begin{aligned}
P(\text{meet spec}) &= P(\text{all S in trials 1 to } N_1) \\
&\quad + P(\text{1 F in trials 1 to } N_1) \times P(\text{all S in trials } N_1 + 1 \text{ to } N_2) \\
&= p^{N_1} + \binom{N_1}{1}(1-p)\,p^{N_1-1}\,p^{N_2-N_1}.
\end{aligned} \tag{9}
\]
The general formula for the probability of meeting the 90/90 reliability specification in a
three-stage test is more complex and not shown here. However, this probability as a function of the
percent defective (probability of failure for any individual trial) is plotted in figure 4. As
expected, the relationship is decreasing, and the probability of successfully meeting the 90/90
reliability criterion increases as more stages (and hence more trials) are added. These plots are
akin to operating characteristic curves frequently shown in quality-control circles, and are
helpful in providing an a priori estimate for the probability of a successful test.

For example, figure 4a shows that even if the true success probability is 95% (a defective rate
of 5%), there is only a 46% chance that a 15-trial one-stage test using Wilson score LCBs will
result in all successes and meet the 90/90 reliability specification. The same material in a
13-trial one-stage test using Jeffreys LCBs has only a 51% chance of meeting the specification.
This highlights the risk of conducting a small-sample test to pass material at a high level of
performance. Use of a two- or three-stage test will improve the chance that high-quality material
is found to meet the specification, but at the price of nearly doubling (or tripling) the number of
trials.
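
The one-stage figures quoted above follow directly from $p^{N_1}$, and the two-stage probability adds the term for a single first-stage failure followed by all successes. The Python sketch below evaluates both; the function names are hypothetical, and the two-stage plan (15, 32) is the Wilson score plan described earlier.

```python
def p_meet_one_stage(p: float, n1: int) -> float:
    """Probability that all n1 trials succeed."""
    return p ** n1


def p_meet_two_stage(p: float, n1: int, n2: int) -> float:
    """All n1 first-stage successes, or exactly one first-stage failure
    followed by n2 - n1 second-stage successes."""
    return p ** n1 + n1 * (1 - p) * p ** (n1 - 1) * p ** (n2 - n1)


print(round(p_meet_one_stage(0.95, 15), 2))  # 0.46 (Wilson score, 15 trials)
print(round(p_meet_one_stage(0.95, 13), 2))  # 0.51 (Jeffreys, 13 trials)
print(p_meet_two_stage(0.95, 15, 32))        # two-stage Wilson score plan
```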


5.2 Sample Size Expectation

Another important characteristic of a sampling protocol is the expected number of samples until
test termination. Because a multistage test can be stopped early, the number of trials conducted
until the test is terminated is a random variable. Therefore, we can calculate its expected value.

For example, denoting the number of trials conducted in a one-stage test by $M_1$, its expectation
is $E(M_1) = \sum_{i=1}^{N_1} i \cdot P(M_1 = i)$. Note that if $1 \le M_1 \le N_1 - 1$, then the
test is stopped early because trial $M_1$ is a failure while all prior trials were successes. If
$M_1 = N_1$, then each of the first $N_1 - 1$ trials was a success. So,

\[
\begin{aligned}
E(M_1) &= \sum_{i=1}^{N_1-1} i \cdot P(M_1 = i) + N_1 \cdot P(M_1 = N_1) \\
&= \sum_{i=1}^{N_1-1} i \cdot P(\text{all S in trials 1 to } i-1, \text{ then F})
 + N_1 \cdot P(\text{all S in trials 1 to } N_1 - 1).
\end{aligned} \tag{10}
\]
Figure 4. Relationship between percentage of defective items in the population and the probability of
meeting the 90/90 reliability specification using (a) Wilson score and (b) Jeffreys methods of
computing the LCB for reliability.


Each of the probabilities in equation 10 can be evaluated using the binomial distribution, leading
to

\[
E(M_1) = (1-p)\sum_{i=1}^{N_1-1} i\,p^{i-1} + N_1 p^{N_1-1}. \tag{11}
\]
Substituting for the summation of a finite series in the left addend, we get

\[
E(M_1) = (1-p)\left[\frac{1 - p^{N_1-1}}{(1-p)^{2}} - \frac{(N_1 - 1)\,p^{N_1-1}}{1-p}\right] + N_1 p^{N_1-1}
= \frac{1 - p^{N_1}}{1-p}. \tag{12}
\]


While the details are omitted here for brevity, it can be shown that the expected number of trials
conducted in a two-stage test, $M_2$, is

\[
E(M_2) = \frac{2\left(1 - p^{N_1}\right)}{1-p} - N_1 p^{N_2-1}, \tag{13}
\]


and the expected number of trials for a three-stage test equals

\[
E(M_3) = \frac{3\left(1 - p^{N_1}\right)}{1-p} - 2N_1 p^{N_2-1}
 - \frac{1}{2} N_1 \left(2N_2 - N_1 - 1\right)(1-p)\,p^{N_3-2}. \tag{14}
\]


Figure 5 displays the relationship between percent defectives and the expected number of
samples for various staged sampling protocols and LCB methods. In one-stage tests, as the
percent defectives decreases we are more likely to carry out the full test of $N_1$ trials. For two-
and three-stage tests, this same principle applies up to a point at which the decreasing percent
defectives results in the increased application of early stopping rules.
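
Both the probability of meeting the specification and the expected number of trials can also be computed directly from the stopping rules, without closed forms, by tracking the probability of each (trial, failure count) state. The Python sketch below is one such illustration and is not taken from the original study; the function names are hypothetical, and the one- and two-stage closed forms included for comparison are stated here as they follow from the stopping rules.

```python
def multistage_summary(p, boundaries):
    """Return (P(meet spec), E[number of trials]) for a multistage plan.

    boundaries -- cumulative trial counts at the end of each stage, e.g. [15, 32, 47];
                  the test passes at the end of stage k (0-based) if failures <= k.
    """
    max_fail = len(boundaries) - 1
    alive = {0: 1.0}  # failures so far -> probability that testing is still under way
    p_pass = 0.0
    expected = 0.0
    for trial in range(1, boundaries[-1] + 1):
        nxt = {}
        for f, prob in alive.items():
            nxt[f] = nxt.get(f, 0.0) + prob * p            # this trial succeeds
            nxt[f + 1] = nxt.get(f + 1, 0.0) + prob * (1 - p)  # this trial fails
        alive = {}
        for f, prob in nxt.items():
            if f > max_fail:                               # allowance exceeded: stop
                expected += trial * prob
            elif trial in boundaries and f <= boundaries.index(trial):
                p_pass += prob                             # stop with specification met
                expected += trial * prob
            else:
                alive[f] = prob
    return p_pass, expected


def expected_one_stage(p, n1):
    return (1 - p ** n1) / (1 - p)


def expected_two_stage(p, n1, n2):
    return 2 * (1 - p ** n1) / (1 - p) - n1 * p ** (n2 - 1)


for plan in ([15], [15, 32], [15, 32, 47]):
    print(plan, multistage_summary(0.95, plan))
```

Sweeping `p` over a grid of values and plotting the two returned quantities gives curves of the same kind as those described for figures 4 and 5.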
Figure 5. Percentage of defective items in the population vs. expected number of samples under several
sampling protocols using (a) the Wilson score and (b) Jeffreys methods of constructing LCBs.
6. Summary


Over the years, and even to this day, many statistics texts have advocated use of the
Clopper-Pearson method for interval estimation because of its “exactness.” Hence, many
zero-failure acceptance tests have been conducted with 22 trials with the intent of showing at
least 90% reliability with 90% confidence. This report has shown that this sample size is
unnecessarily high. It is possible using the Wilson score method of LCB construction to show this
same level of performance with only 15 trials, a savings of 32% in test resources. The savings
under the Jeffreys method are even greater: only 13 trials and a 41% cost reduction.

Both the Wilson score and Jeffreys methods do come with some risks. On average, both
methods are slightly conservative in terms of their coverage probability; however, there are some
values of p for which the actual coverage probability of a 90% Wilson score LCB can be closer
to 80%, meaning that the LCBs tend to be too large. Figure 4 reveals that for large-sample tests
using a Jeffreys LCB, the coverage probability may be as small as 75% for a limited range of
success probabilities near 97%, meaning that there is nearly a 1-in-4 chance that certain
high-grade material may not meet the 90/90 specification.

If a conservative estimate of at least 90% reliability is desired from our test, then the actual
probability of success for the system should be substantially greater than 90%. Figure 4 shows
that even when p = 0.95, there is only about a 46% chance of confirming that the system is
functioning at the desired level of performance when using a one-stage, Wilson score-based test.
When p = 0.99, this chance increases to about 86%. Because fewer tests are required under a
Jeffreys-based test procedure, the probability of meeting the 90/90 specification is higher by
about 5 percentage points when p = 0.95 and about 2 percentage points when p = 0.99. These
observations are made to point out that there is a significant risk of failing to pass even
high-quality material under a small-sample test protocol in which no failures are allowed. If we
are willing to commit to a larger two-stage test, we stand a much greater chance of showing that
good-quality material (e.g., armor packages) meets the reliability specification.